Creating features with Animal Crossing data

DSST 289: Introduction to Data Science

Erik Fredner

2024-09-09

Overview

  • Animal Crossing 🐟 data
  • mutate()
  • if_else()
  • case_when()
  • exam review

🐟 = 💰

What’s the best strategy to get rich fishing?

Fishing in Animal Crossing

Animal Crossing fish data

name            value  location  spawn_low  spawn_high
giant trevally   4500  Pier              1           1
blowfish         5000  Sea               5           5
sturgeon        10000  River             1           4
angelfish        3000  River             2           5
olive flounder    800  Sea               4           6

Creating spawn rates with mutate()

  • Spawn rates tell us how likely we are to see a fish.
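  • Fish with higher spawn values appear more often than fish with lower values.
  • We estimate each fish's spawn rate as the midpoint of spawn_low and spawn_high.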
fish |>
  mutate(spawn_rate = (spawn_low + spawn_high) / 2) |>
  select(name, spawn_low, spawn_high, spawn_rate) |>
  arrange(desc(spawn_rate)) |>
  slice_head(n = 5)
# A tibble: 5 × 4
  name           spawn_low spawn_high spawn_rate
  <chr>              <dbl>      <dbl>      <dbl>
1 salmon                20         20       20  
2 pond smelt            18         20       19  
3 horse mackerel        14         21       17.5
4 bitterling            12         17       14.5
5 sea bass              11         18       14.5
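
To try this step without the full dataset, here is a minimal sketch that rebuilds the five preview rows from the data table earlier in these slides and derives spawn_rate the same way. The name fish_preview is hypothetical and stands in for fish, so the result will not match the top-five output above.

```r
library(dplyr)

# Hypothetical stand-in for `fish`: the five preview rows shown earlier
fish_preview <- tibble(
  name       = c("giant trevally", "blowfish", "sturgeon",
                 "angelfish", "olive flounder"),
  value      = c(4500, 5000, 10000, 3000, 800),
  location   = c("Pier", "Sea", "River", "River", "Sea"),
  spawn_low  = c(1, 5, 1, 2, 4),
  spawn_high = c(1, 5, 4, 5, 6)
)

# mutate() adds a column computed row by row from existing columns
fish_preview <- fish_preview |>
  mutate(spawn_rate = (spawn_low + spawn_high) / 2)

# Same display steps as above, applied to the small table
fish_preview |>
  select(name, spawn_low, spawn_high, spawn_rate) |>
  arrange(desc(spawn_rate))
```

Because mutate() works on whole columns at once, the same line of code works unchanged on the full fish table.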

mutate with if_else

fish |>
  mutate(
    spawn_rate = (spawn_low + spawn_high) / 2,
    # new:
    spawn_freq = if_else(spawn_rate > 10, "common", "rare")
  ) |>
  select(name, spawn_rate, spawn_freq) |>
  # slice_sample() draws rows at random, so sorting beforehand would have no effect
  slice_sample(n = 5)
# A tibble: 5 × 3
  name            spawn_rate spawn_freq
  <chr>                <dbl> <chr>     
1 bitterling            14.5 common    
2 sea butterfly         10.5 common    
3 ranchu goldfish        1.5 rare      
4 puffer fish            7.5 rare      
5 sea horse              6   rare      
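
Two details of if_else() are easy to miss: the comparison here is strict, and missing values stay missing unless told otherwise. The sketch below illustrates both on a plain vector. The vector rates is hypothetical: 14.5, 10.5, and 1.5 come from the output above, while 10 and NA are added only to show the edge cases.

```r
library(dplyr)

# Hypothetical rates: three from the output above, plus two edge cases
rates <- c(14.5, 10.5, 10, 1.5, NA)

# A rate of exactly 10 is "rare" because `>` is strict; NA stays NA
labels <- if_else(rates > 10, "common", "rare")
labels
# "common" "common" "rare" "rare" NA

# The `missing` argument supplies a label for NA conditions
labels_filled <- if_else(rates > 10, "common", "rare", missing = "unknown")
labels_filled
# "common" "common" "rare" "rare" "unknown"
```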

Is common vs. rare enough?

library(ggrepel) # provides geom_text_repel()
fish |>
  mutate(
    spawn_rate = (spawn_low + spawn_high) / 2,
    spawn_freq = if_else(spawn_rate > 10, "common", "rare")
  ) |>
  ggplot(aes(x = spawn_rate, y = value, color = spawn_freq)) +
  geom_point() +
  geom_text_repel(aes(label = name))

case_when for multiple categories

fish <- fish |>
  mutate(
    spawn_rate = (spawn_low + spawn_high) / 2,
    spawn_freq = case_when(
      spawn_rate > 15 ~ "very common",
      spawn_rate > 10 ~ "common",
      spawn_rate > 5 ~ "uncommon",
      spawn_rate > 2 ~ "rare",
      spawn_rate <= 2 ~ "very rare",
      # fallback: reached only when spawn_rate is NA
      TRUE ~ "default"
    )
  )
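
The order of the conditions matters because case_when() uses the first condition that is TRUE for each value. As a minimal sketch, the example below applies the same thresholds in two different orders to a hypothetical vector rates, whose values all appear in earlier output.

```r
library(dplyr)

# Hypothetical vector of spawn rates taken from earlier output
rates <- c(20, 14.5, 6, 3.5, 1)

# Same order as above: strictest threshold first
ordered <- case_when(
  rates > 15 ~ "very common",
  rates > 10 ~ "common",
  rates > 5  ~ "uncommon",
  rates > 2  ~ "rare",
  rates <= 2 ~ "very rare"
)
ordered
# "very common" "common" "uncommon" "rare" "very rare"

# Loosest threshold first: every rate above 2 matches the first condition
misordered <- case_when(
  rates > 2  ~ "rare",
  rates > 5  ~ "uncommon",
  rates > 10 ~ "common",
  rates > 15 ~ "very common",
  rates <= 2 ~ "very rare"
)
misordered
# "rare" "rare" "rare" "rare" "very rare"
```

Listing the strictest threshold first is what lets each later condition act as "everything not already matched."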

What that looks like

fish |>
  select(name, spawn_rate, spawn_freq) |>
  # slice_sample() draws rows at random, so sorting beforehand would have no effect
  slice_sample(n = 5) |>
  knitr::kable()
name             spawn_rate  spawn_freq
anchovy                 3.5  rare
ranchu goldfish         1.5  very rare
pike                    1.5  very rare
whale shark             1.0  very rare
cherry salmon           6.0  uncommon

Valuable and common?

fish |>
  ggplot(aes(x = spawn_rate, y = value, color = spawn_freq)) +
  geom_point() +
  geom_text_repel(aes(label = name)) +
  # draw line from origin to max
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = max(spawn_rate),
                   yend = max(value)),
               linetype = "dashed",
               color = "gray",
               alpha = 0.1)

Where should we fish?

fish |>
  # get rid of common or very common fish:
  filter(!spawn_freq %in% c("common", "very common")) |>
  # only keep fish above median value:
  filter(value > median(value)) |>
  ggplot(aes(x = spawn_rate, y = value, color = location)) +
  geom_point() +
  geom_text_repel(aes(label = name)) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = max(spawn_rate),
                   yend = max(value)),
               linetype = "dashed",
               color = "gray",
               alpha = 0.1)

From spawn rate to spawn probability

fish |>
  group_by(location) |>
  mutate(spawn_prob = spawn_rate / sum(spawn_rate)) |>
  ungroup() |>
  select(name, spawn_rate, location, spawn_prob) |>
  slice_sample(n = 5)
# A tibble: 5 × 4
  name           spawn_rate location spawn_prob
  <chr>               <dbl> <chr>         <dbl>
1 arapaima              1   River       0.00539
2 sweetfish             7.5 River       0.0404 
3 butterfly fish        4.5 Sea         0.0312 
4 king salmon           5   River       0.0270 
5 pale chub             7.5 River       0.0404 
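
Dividing by sum(spawn_rate) inside group_by() means the probabilities add up to 1 within each location, not across the whole table. The sketch below checks this on the five preview rows from the data table earlier in these slides. The names fish_preview, probs, and totals are hypothetical, and with only five fish the probabilities differ from the full-data output above.

```r
library(dplyr)

# Hypothetical stand-in for `fish`: five preview rows with their spawn rates
fish_preview <- tibble(
  name       = c("giant trevally", "blowfish", "sturgeon",
                 "angelfish", "olive flounder"),
  location   = c("Pier", "Sea", "River", "River", "Sea"),
  spawn_rate = c(1, 5, 2.5, 3.5, 5)
)

# Each fish's share of the total spawn rate at its own location
probs <- fish_preview |>
  group_by(location) |>
  mutate(spawn_prob = spawn_rate / sum(spawn_rate)) |>
  ungroup()

# Check: the probabilities sum to 1 within every location
totals <- probs |>
  group_by(location) |>
  summarize(total_prob = sum(spawn_prob))

totals
```

Without group_by(), sum(spawn_rate) would be taken over all fish, and the result would describe the chance of a fish among every location combined.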

Sneak preview: simulation

Exam review